Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low-resource settings.
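To make the type-level setup concrete, here is a minimal sketch of the idea in PyTorch: a character bi-LSTM is trained to reproduce each in-vocabulary word's pre-trained embedding from its spelling, and the same network then generates vectors for OOV words. All class names, hyperparameters, and the use of mean squared error (in place of the paper's squared Euclidean distance) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CharToVec(nn.Module):
    """Hypothetical sketch: map a word's character sequence to an embedding."""
    def __init__(self, n_chars, char_dim=20, hidden_dim=50, emb_dim=100):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.lstm = nn.LSTM(char_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        # Project the concatenated forward/backward states to embedding size.
        self.proj = nn.Linear(2 * hidden_dim, emb_dim)

    def forward(self, char_ids):  # char_ids: (batch, word_len)
        states, _ = self.lstm(self.char_emb(char_ids))
        h = states.size(2) // 2
        fwd = states[:, -1, :h]   # final state of the forward direction
        bwd = states[:, 0, h:]    # final state of the backward direction
        return self.proj(torch.cat([fwd, bwd], dim=-1))

# Type-level training: one (spelling, target vector) pair per vocabulary
# word, so the original embedding corpus is never revisited.
model = CharToVec(n_chars=128)
opt = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()  # stand-in for a squared-distance objective

def train_step(char_ids, target_vec):
    opt.zero_grad()
    loss = loss_fn(model(char_ids), target_vec)
    loss.backward()
    opt.step()
    return loss.item()
```

At test time, an OOV word's embedding is simply `model(char_ids)` on its spelling; training cost scales with vocabulary size rather than corpus size, which is what makes the type-level formulation cheap.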